Goto

Collaborating Authors

 Belize


Ancient Mayan water filters stopped a lot--just not mercury poisoning

Popular Science

The civilization made the most of its technology, but everything has its limits. Mayan society often relied on cinnabar, a deep red pigment that got its hue from mercury sulfide. Breakthroughs, discoveries, and DIY tips sent six days a week. A trio of ancient reservoirs in present-day Guatemala is revealing both the strength--and limitations--of Mayan water science. While the civilization's purification techniques resulted in comparatively clean drinking sources, archaeologists say the unknowable consequences of a commonly used, deep-red pigment consistently subjected the Indigenous population to toxic mercury poisoning .



A Appendix

Neural Information Processing Systems

The complete list may be seen in Table 8. Here are a few general notes about these strings: 1. Based on their recommendations, we did the following: 1. zh, zh_Latn: This resulted in the special filters described below. URLs) the corpora were in languages different from the LangID predictions. This is mainly mis-rendered PDFs and may have practical applications for denoising, or for decoding such garbled PDFs.



Chinese-linked biolab discovered in Las Vegas home sparks federal investigation with SWAT response

FOX News

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by LSEG .


Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.


Unlocking the Potential of Global Human Expertise

Neural Information Processing Systems

For example, in the Pandemic Response Challenge experiment, the context consisted of data about the geographic region for which the predictions were made, e.g., historical data of COVID-19 cases and intervention policies; actions were future schedules of intervention policies for the region; and outcomes were predicted future cases of COVID-19 along with the stringency


Language Specific Knowledge: Do Models Know Better in X than in English?

Agarwal, Ishika, Bozdag, Nimet Beyza, Hakkani-Tür, Dilek

arXiv.org Artificial Intelligence

Often, multilingual language models are trained with the objective to map semantically similar content (in different languages) in the same latent space. In this paper, we show a nuance in this training objective, and find that by changing the language of the input query, we can improve the question answering ability of language models. Our contributions are two-fold. First, we introduce the term Language Specific Knowledge (LSK) to denote queries that are best answered in an "expert language" for a given LLM, thereby enhancing its question-answering ability. We introduce the problem of language selection -- for some queries, language models can perform better when queried in languages other than English, sometimes even better in low-resource languages -- and the goal is to select the optimal language for the query. Second, we introduce simple to strong baselines to test this problem. Additionally, as a first-pass solution to this novel problem, we design LSKExtractor to benchmark the language-specific knowledge present in a language model and then exploit it during inference. To test our framework, we employ three datasets that contain knowledge about both cultural and social behavioral norms. Overall, LSKExtractor achieves up to 10% relative improvement across datasets, and is competitive against strong baselines, while being feasible in real-world settings. Broadly, our research contributes to the open-source development (https://github.com/agarwalishika/LSKExtractor/tree/main) of language models that are inclusive and more aligned with the cultural and linguistic contexts in which they are deployed.


Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR team, null, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging a LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.


From Anger to Joy: How Nationality Personas Shape Emotion Attribution in Large Language Models

Kamruzzaman, Mahammed, Monsur, Abdullah Al, Kim, Gene Louis, Chhabra, Anshuman

arXiv.org Artificial Intelligence

Emotions are a fundamental facet of human experience, varying across individuals, cultural contexts, and nationalities. Given the recent success of Large Language Models (LLMs) as role-playing agents, we examine whether LLMs exhibit emotional stereotypes when assigned nationality-specific personas. Specifically, we investigate how different countries are represented in pre-trained LLMs through emotion attributions and whether these attributions align with cultural norms. To provide a deeper interpretive lens, we incorporate four key cultural dimensions, namely Power Distance, Uncertainty Avoidance, Long-Term Orientation, and Individualism, derived from Hofstedes cross-cultural framework. Our analysis reveals significant nationality-based differences, with emotions such as shame, fear, and joy being disproportionately assigned across regions. Furthermore, we observe notable misalignment between LLM-generated and human emotional responses, particularly for negative emotions, highlighting the presence of reductive and potentially biased stereotypes in LLM outputs.